
    AcListant with Continuous Learning: Speech Recognition in Air Traffic Control (EIWAC 2019)

    Increasing air traffic creates many challenges for air traffic management (ATM). A general answer to these challenges is to increase automation. However, communication between air traffic controllers (ATCos) and pilots is still largely analog and far removed from digital ATM components. As communication content is important for the ATM system, commands are still entered manually by ATCos so that the ATM system can take the communication into account. The disadvantage is additional workload for the ATCos. To avoid this additional effort, automatic speech recognition (ASR) can automatically analyze the communication and extract the content of spoken commands. DLR, together with Saarland University, invented the AcListant® system, the first assistant-based speech recognition (ABSR) with both a high command recognition rate and a low command recognition error rate. Besides the high recognition performance, the AcListant® project revealed shortcomings with respect to costly adaptations of the speech recognizer to different environments. To counteract this disadvantage, machine learning algorithms for the automatic adaptation of ABSR to different airports were developed within the Single European Sky ATM Research Programme (SESAR) 2020 Exploratory Research project MALORCA. To support the standardization of speech recognition in ATM, an ontology for ATC command recognition on the semantic level was developed in the SESAR Industrial Research project PJ.16-04 to enable the reuse of expensive, manually transcribed ATC communication. Finally, results and experiences are used in two further SESAR Wave-2 projects. This paper presents the evolution of ABSR from AcListant® via MALORCA and PJ.16-04 to the SESAR Wave-2 projects.

    Early Callsign Highlighting using Automatic Speech Recognition to Reduce Air Traffic Controller Workload

    The primary task of an air traffic controller (ATCo) is to issue instructions to pilots. However, the first contact is often initiated by the pilot. It is therefore useful to have a controller assistance system that recognizes and highlights the spoken callsign as early as possible, directly from the speech data. We propose to use an automatic speech recognition (ASR) system to obtain the speech-to-text transcription, from which we extract the spoken callsign. As high callsign recognition performance is required, we additionally exploit surveillance data, which significantly improves the performance. We obtain callsign recognition error rates of 6.2% and 8.3% for ATCo and pilot utterances, respectively, which improve to 2.8% and 4.5% when information from surveillance data is used.
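    The abstract does not detail how the extracted callsign is matched against the surveillance data; the sketch below illustrates one plausible, simplified approach (all callsigns, spoken forms, and function names are invented for illustration): the callsign hypothesis taken from the ASR transcript is compared with the spoken forms of the callsigns currently known from surveillance, and the closest one is selected.

```python
# Illustrative sketch only: resolve a spoken callsign hypothesis against the
# callsigns currently known from surveillance data. Names and data are invented.
from difflib import SequenceMatcher

# Spoken forms of the callsigns currently visible in the surveillance data
# (in practice these would be generated from the ICAO callsigns, e.g. "DLH32A").
SURVEILLANCE_CALLSIGNS = {
    "DLH32A": "lufthansa three two alfa",
    "DLH23A": "lufthansa two three alfa",
    "AUA32A": "austrian three two alfa",
}

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()

def match_callsign(asr_words: str) -> str:
    """Return the surveillance callsign whose spoken form is closest to the ASR output."""
    return max(SURVEILLANCE_CALLSIGNS,
               key=lambda cs: similarity(asr_words, SURVEILLANCE_CALLSIGNS[cs]))

# The ASR output garbled the airline designator, but surveillance data resolves it.
print(match_callsign("hansa three two alfa"))  # -> DLH32A
```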

    ATTENTION: TARGET AND ACTUAL – THE CONTROLLER FOCUS

    The main task of an air traffic controller (ATCO) is to ensure safe and efficient air traffic control (ATC). The ATCO therefore needs to have his/her attention at the right place at the right time on the controller working position’s displays. This will become even more challenging in the future with increasing information diversity, growing levels of automation, a more complex air traffic mix, new technologies, and bigger screens. To deal with these challenges, an attention-guiding assistance system is being developed to support the ATCO. This system needs to determine the target area of attention derived from relevant upcoming ATC events. It also needs to estimate the current area of attention from the ATCO's gaze, e.g., via eye-tracking. If there is a mismatch between the target and the actual area of attention, the attention focus of the ATCO has to be appropriately guided to the relevant areas via cues. Based on an analysis of attention and situation awareness, attention guidance mechanisms have been developed and successfully validated in human-in-the-loop trials. ATCOs felt well supported by visual, non-intrusive guidance cues and even wanted to have such functionality in today’s working positions.
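    As a rough illustration of the target-versus-actual comparison described above, the following sketch checks whether any recent gaze sample falls within the target area of attention and, if not, triggers a guidance cue. Coordinates, thresholds, and names are assumptions, not the validated system's design.

```python
# Hypothetical sketch of the target-vs-actual attention check described above.
from dataclasses import dataclass

@dataclass
class Area:
    x: float        # screen position of the area centre (pixels)
    y: float
    radius: float   # radius within which a gaze sample counts as "attended"

def gaze_inside(area: Area, gaze_x: float, gaze_y: float) -> bool:
    return (gaze_x - area.x) ** 2 + (gaze_y - area.y) ** 2 <= area.radius ** 2

def needs_guidance(target: Area, gaze_samples: list[tuple[float, float]]) -> bool:
    """True if none of the recent gaze samples fell on the target area."""
    return not any(gaze_inside(target, gx, gy) for gx, gy in gaze_samples)

# Example: an upcoming ATC event at a radar label located at (1200, 450)
target_area = Area(x=1200, y=450, radius=80)
recent_gaze = [(300, 900), (320, 880), (1500, 200)]   # last few eye-tracker samples
if needs_guidance(target_area, recent_gaze):
    print("show non-intrusive visual cue at target area")
```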

    Automatic Speech Analysis Framework for ATC Communication in HAAWAII

    Over the past years, several SESAR-funded exploratory projects have focused on bringing speech and language technologies to the Air Traffic Management (ATM) domain and demonstrating their added value through successful applications. The recently ended HAAWAII project developed a generic architecture and framework, which was validated through several tasks such as callsign highlighting, pre-filling radar labels, and readback error detection. The primary goal was to support pilot and air traffic controller communication by deploying Automatic Speech Recognition (ASR) engines. Contextual information (if available) extracted from surveillance data, flight plan data, or previous communication can be exploited via entity boosting to further improve the recognition performance. HAAWAII proposed various design attributes to integrate the ASR engine into the ATM framework, often depending on the concrete technical specifics of the target air navigation service providers (ANSPs). This paper gives a brief overview and provides an objective assessment of the speech processing components developed and integrated into the HAAWAII framework. Specifically, the following tasks are evaluated with respect to the application domain: (i) speech activity detection, (ii) speaker segmentation and speaker role classification, and (iii) ASR. To the best of our knowledge, the HAAWAII framework offers the best-performing speech technologies for ATM, reaching high recognition accuracy (i.e., error correction done by exploiting additional contextual data), robustness (i.e., models developed using large training corpora), and support for rapid domain transfer (i.e., to a new ATM sector with minimum investment). Two scenarios provided by ANSPs were used for testing, achieving callsign detection accuracies of about 96% and 95% for NATS and ISAVIA, respectively.
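    The entity-boosting step mentioned above can be pictured, in a much simplified form, as rescoring the ASR n-best list with context phrases derived from surveillance or flight plan data. The following sketch is only an illustration under that assumption; the scores, the boost value, and the example hypotheses are invented.

```python
# Hypothetical sketch of "entity boosting": rescoring ASR n-best hypotheses with
# callsign phrases known from surveillance/flight-plan context.

def boost_hypotheses(nbest, context_phrases, boost=2.0):
    """Add a bonus to each hypothesis score for every context phrase it contains."""
    rescored = []
    for text, score in nbest:
        bonus = sum(boost for phrase in context_phrases if phrase in text)
        rescored.append((text, score + bonus))
    return max(rescored, key=lambda item: item[1])

nbest = [
    ("speedbird one two alfa descend flight level one two zero", -10.3),
    ("speedbird one two alpha descend flight level one two zero", -9.8),
]
context = ["speedbird one two alfa", "ryanair four five bravo"]   # from surveillance data
print(boost_hypotheses(nbest, context))   # the context phrase promotes the first hypothesis
```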

    Brain–Computer Interface-Based Adaptive Automation to Prevent Out-Of-The-Loop Phenomenon in Air Traffic Controllers Dealing With Highly Automated Systems

    Increasing the level of automation in air traffic management is seen as a measure to increase the performance of the service to satisfy the predicted future demand. This is expected to result in new roles for the human operator: they will mainly monitor highly automated systems and only seldom intervene. Therefore, air traffic controllers (ATCos) would often work in a supervisory or control mode rather than in a direct operating mode. However, it has been demonstrated that human operators in such a role are affected by human performance issues, known as the Out-Of-The-Loop (OOTL) phenomenon, consisting of lack of attention, loss of situational awareness, and de-skilling. A countermeasure to this phenomenon has been identified in adaptive automation (AA), i.e., a system able to allocate the operative tasks to the machine or to the operator depending on their needs. In this context, psychophysiological measures have been highlighted as a powerful tool to provide a reliable, unobtrusive, and real-time assessment of the ATCo’s mental state to be used as the control logic for AA-based systems. This paper presents the so-called “Vigilance and Attention Controller”, a system based on electroencephalography (EEG) and eye-tracking (ET) techniques, aimed at assessing in real time the vigilance level of an ATCo dealing with a highly automated human–machine interface and at using this measure to adapt the level of automation of the interface itself. The system was tested on 14 professional ATCos performing two highly realistic scenarios, one with the system disabled and one with the system enabled. The results confirmed that (i) long, highly automated tasks induce a decrease in vigilance and OOTL-related phenomena; (ii) EEG measures are sensitive to these kinds of mental impairments; and (iii) AA was able to counteract this negative effect by keeping the ATCo more involved in the operative task. The results were confirmed by EEG and ET measures as well as by performance and subjective ones, providing a clear example of the potential applications and related benefits of AA.
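    A minimal sketch of the kind of control logic such an AA-based system might use is given below, assuming the EEG/ET processing yields a normalized vigilance index: when the index drops, tasks are handed back to the operator to keep them in the loop. Thresholds and level names are illustrative assumptions, not the published system.

```python
# Hypothetical adaptive-automation control logic driven by a vigilance estimate.
# Thresholds and automation levels are invented for illustration.

def select_automation_level(vigilance_index: float,
                            low: float = 0.4,
                            high: float = 0.7) -> str:
    """Map a normalized vigilance estimate (0..1) to an automation level."""
    if vigilance_index < low:
        return "LOW_AUTOMATION"    # give tasks back to the ATCo to restore engagement
    if vigilance_index > high:
        return "HIGH_AUTOMATION"   # operator is alert; automation may take over routine tasks
    return "KEEP_CURRENT_LEVEL"

for v in (0.85, 0.55, 0.30):
    print(v, "->", select_automation_level(v))
```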

    Customization of Automatic Speech Recognition Engines for Rare Word Detection Without Costly Model Re-Training

    Thanks to Alexa, Siri, or Google Assistant, automatic speech recognition (ASR) has changed our daily life during the last decade. Prototypic applications in the air traffic management (ATM) domain are also available. Recently, pre-filling radar label entries with ASR support has reached the technology readiness level before industrialization (TRL6). However, seldom spoken and airspace-related words relevant in the ATM context remain a challenge for sophisticated applications. Open-source ASR toolkits or large pre-trained models for experts, which allow tailoring ASR to new domains, can be exploited, with the typical constraint that a certain amount of domain-specific training data is available, i.e., transcribed speech for adapting acoustic and/or language models. In general, it is sufficient for a "universal" ASR engine to reliably recognize the few hundred words that form the vocabulary of the voice communications between air traffic controllers and pilots. However, for each airport, some hundred airport-dependent words that are seldom spoken need to be integrated. These challenging word entities comprise special airline designators and waypoint names like "dexon" or "burok", which only appear in a specific region. When used, they are highly informative and thus require high recognition accuracies. Plug-and-play customization with minimum expert manipulation assumes that no additional training, i.e., fine-tuning of the universal ASR, is required. This paper presents an innovative approach to automatically integrate new specific word entities into the universal ASR system. The recognition rate of these region-specific word entities increases by a factor of 6 with respect to the universal ASR.
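    One way to picture such plug-and-play customization, without claiming it is the paper's method, is to add pronunciations for the new word entities to the recognizer's lexicon and to pass them to the decoder as a biasing (context) list, leaving the acoustic model untouched. The sketch below uses a deliberately naive letter-to-phone mapping as a stand-in for a real grapheme-to-phoneme tool; all names are illustrative.

```python
# Illustrative sketch: extend an ASR lexicon and biasing list with region-specific
# words (waypoints, airline designators) without retraining the acoustic model.

NAIVE_G2P = {"a": "AH", "b": "B", "d": "D", "e": "EH", "k": "K",
             "n": "N", "o": "OW", "r": "R", "u": "UH", "x": "K S"}

def naive_pronunciation(word: str) -> str:
    """Toy letter-to-phone mapping; a real system would use a G2P model."""
    return " ".join(NAIVE_G2P.get(ch, ch.upper()) for ch in word.lower())

def extend_lexicon(lexicon: dict, new_words: list[str]) -> dict:
    """Add pronunciations for new words; existing entries stay untouched."""
    for word in new_words:
        lexicon.setdefault(word, naive_pronunciation(word))
    return lexicon

region_specific = ["dexon", "burok"]           # waypoint names from the abstract
lexicon = extend_lexicon({}, region_specific)
biasing_list = list(lexicon)                   # handed to the decoder as context phrases
print(lexicon, biasing_list)
```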

    Ensuring Safety for Artificial-Intelligence-Based Automatic Speech Recognition in Air Traffic Control Environment

    This paper describes the safety assessment conducted in the SESAR2020 project PJ.10-W2-96 ASR on automatic speech recognition (ASR) technology implemented for air traffic control (ATC) centers. ASR already enables the automatic recognition of aircraft callsigns and various ATC commands, including command types, based on controller–pilot voice communications for presentation at the controller working position. The presented safety assessment process consists of defining design requirements for the ASR technology application in normal, abnormal, and degraded modes of ATC operations. A total of eight functional hazards were identified based on the analysis of four use cases. The safety assessment was supported by top-down and bottom-up modelling and analysis of the causes of the hazards to derive system design requirements for the purpose of mitigating them. The assessment of achieving the specified design requirements was supported by evidence generated from two real-time simulations with pre-industrial ASR prototypes in approach and en-route operational environments. The simulations, focusing especially on the safety aspects of the ASR application, also validated the hypotheses that ASR reduces controllers’ workload and increases situational awareness. The missing validation element, i.e., an analysis of the safety effects of ASR in ATC, is the focus of this paper. As a result of the safety assessment activities, mitigations were derived for each hazard, demonstrating that the use of ASR does not increase safety risks and is, therefore, ready for industrialization.

    Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition

    Automatic Speech Recognition (ASR) for air traffic control is generally trained by pooling Air Traffic Controller (ATCO) and pilot data. In practice, this is motivated by the amount of annotated data from pilots being smaller than that from ATCOs. However, due to the data imbalance between ATCOs and pilots and their varying acoustic conditions, ASR performance is usually significantly better for ATCO speech than for pilot speech. Obtaining the speaker roles requires manual effort when the voice recordings are collected using Very High Frequency (VHF) receivers and the data is noisy and in a single channel without the push-to-talk (PTT) signal. In this paper, we propose to (1) split the ATCO and pilot data using an intuitive approach exploiting ASR transcripts and (2) consider ATCO and pilot ASR as two separate tasks for Acoustic Model (AM) training. The paper focuses on applying this approach to noisy data collected using VHF receivers, as this data is helpful for training despite its noisy nature. We also developed a simple yet efficient knowledge-based system for speaker role classification based on grammar defined by the International Civil Aviation Organization (ICAO). Our system accepts text as input, i.e., either gold annotations or transcripts generated by an ABSR system. This approach provides an average accuracy in speaker role identification of 83%. Finally, we show that training AMs separately for each task, or using a multitask approach, is better suited to the noisy data than the traditional ASR system, where all data is pooled together for AM training.
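    A grammar-motivated role classifier of this kind can be approximated with a few rules from ICAO phraseology: ATCO transmissions typically start with the addressed callsign followed by an instruction, whereas pilots typically read back and append their own callsign at the end. The sketch below is a simplified illustration of that idea; the keyword lists and rules are assumptions, not the paper's system.

```python
# Illustrative grammar-based speaker-role classifier working on transcript text.

COMMAND_WORDS = {"descend", "climb", "turn", "cleared", "contact", "reduce", "maintain"}

def is_callsign_like(words: list[str]) -> bool:
    """Very rough check: an airline designator followed by spoken digits."""
    digits = {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"}
    return len(words) >= 2 and any(w in digits for w in words[1:])

def classify_speaker_role(transcript: str) -> str:
    words = transcript.lower().split()
    starts_with_callsign = is_callsign_like(words[:4])
    ends_with_callsign = is_callsign_like(words[-4:])
    has_command = any(w in COMMAND_WORDS for w in words)
    if starts_with_callsign and has_command:
        return "ATCO"     # callsign first, then an instruction
    if ends_with_callsign:
        return "PILOT"    # readback ending with the own callsign
    return "UNKNOWN"

print(classify_speaker_role("lufthansa three two alfa descend flight level one two zero"))   # ATCO
print(classify_speaker_role("descending flight level one two zero lufthansa three two alfa"))  # PILOT
```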

    How to Measure Speech Recognition Performance in the Air Traffic Control Domain? The Word Error Rate is only half of the truth

    Applying Automatic Speech Recognition (ASR) in the domain of analogue voice communication between air traffic controllers (ATCos) and pilots involves more end-user requirements than just transforming spoken words into text. Perfect word recognition is useless as long as the semantic interpretation is wrong. For an ATCo it is of no importance whether the words of a greeting are correctly recognized. A wrong recognition of a greeting should, however, not disturb the correct recognition of, e.g., a “descend” command. Recently, 14 European partners from the Air Traffic Management (ATM) domain agreed on a common set of rules, i.e., an ontology, for annotating the speech utterances of an ATCo. This paper first extends the ontology to pilot utterances and then compares different ASR implementations on the semantic level by introducing command recognition, command recognition error, and command rejection rates. The implementation used in this paper achieves a command recognition rate better than 94% for Prague Approach, even when the word error rate (WER) is above 2.5%.
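    The semantic-level rates can be illustrated with a small scoring sketch: recognized commands are compared per utterance with the gold annotation, and correctly recognized, wrongly recognized, and missing commands are counted. The exact rate definitions below are plausible assumptions made for illustration and are not necessarily those used in the paper.

```python
# Illustrative semantic-level scoring: commands are (callsign, type, value) tuples.

def command_rates(gold: list[set], recognized: list[set]) -> dict:
    """gold/recognized: one set of command tuples per utterance, aligned by index."""
    total = correct = wrong = missed = 0
    for g, r in zip(gold, recognized):
        total += len(g)
        correct += len(g & r)     # command present in both gold and output
        wrong += len(r - g)       # command in the output but not in the gold
        missed += len(g - r)      # gold command missing from the output
    return {
        "recognition_rate": correct / total,
        "error_rate": wrong / total,
        "rejection_rate": missed / total,
    }

gold = [{("DLH32A", "DESCEND", "FL120")}, {("AUA45B", "HEADING", "270")}]
reco = [{("DLH32A", "DESCEND", "FL120")}, {("AUA45B", "HEADING", "250")}]
print(command_rates(gold, reco))
# -> {'recognition_rate': 0.5, 'error_rate': 0.5, 'rejection_rate': 0.5}
```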